LOGARITHMIC DEPTH CIRCUITS FOR ALGEBRAIC FUNCTIONS

John Reif
Aiken Computation Laboratory, Harvard University, Cambridge, MA

Technical Report TR-35-82, November 1982 (20 pages)
Report AD-A122 661. Unclassified; distribution unlimited.
Contract N00014-80-C-0674, Office of Naval Research,
800 North Quincy Street, Arlington, VA 22217

Key words: polynomials, interpolation, product, division, elementary functions,
power series, circuit depth, algebraic computation, evaluation.
0.  ABSTRACT

This paper describes circuits for computation of various algebraic functions
on polynomials, power series, integers, and reals.

Let $\mathcal{R}[x]$ be the polynomials and power series over a commutative ring
which supports a fast Fourier transform and let $\mathcal{Q}[x]$ be the polynomials
and power series over the rationals $\mathcal{Q}$.

For polynomials of degree $n-1$, we give circuits of depth $O(\log n)$ for
computing

--  the m-th power of a polynomial and the product of m polynomials in
    $\mathcal{R}[x]$, where $m = O(n)$

--  the symmetric functions on $\mathcal{R}[x]$

--  the remainder and quotient of division of polynomials in $\mathcal{Q}[x]$

--  interpolation of a polynomial in $\mathcal{Q}[x]$.

For power series with n given low order terms, we give circuits of depth
$O(\log n)$ for computing the first n low order terms of

--  the m-th power of a power series in $\mathcal{R}[x]$ and the product of m power
    series in $\mathcal{R}[x]$, where $m = O(n)$

--  the composition of power series in $\mathcal{R}[x]$

--  the reciprocal of a power series and the division of two power series
    in $\mathcal{Q}[x]$

--  the reversion of a power series in $\mathcal{Q}[x]$

--  various elementary functions applied to power series in $\mathcal{Q}[x]$ such as
    (fixed) powers, roots, exponentiation, logarithm, sin, cos, arctangent,
    and hyperbolic cosine.

For integers represented by n bit binary numbers, we give boolean circuits
(whose gates compute the boolean operations $\wedge$, $\vee$, and $\neg$) of depth
$O(\log n (\log\log n)^2)$ for computing:

--  the m-th power of an integer and the product of m integers, where
    $m = O(n)$

--  the remainder and quotient of the division of two integers.

For reals on a finite interval $[a,b]$ represented as floating point numbers
within relative accuracy $O(2^{-n})$, we give boolean circuits of depth
$O(\log n (\log\log n)^2)$ for computing within relative accuracy $O(2^{-n})$:

--  the m-th power of a real and the product of m reals, where $m = O(n)$

--  the reciprocal of a real and division of reals

--  the various elementary functions on reals.

As a consequence of the above, for polynomials and power series in $\mathcal{Q}[x]$
we have uniform boolean circuits of depth $O(\log n (\log\log n)^2)$ for all the
above listed problems for polynomials and power series, and also:

--  evaluation of a polynomial or power series in $\mathcal{Q}[x]$ at n points,
    within relative accuracy $O(2^{-n})$.

All our circuits may be uniformly constructed by a deterministic Turing
machine with space $O(\log n)$. The best circuit depth previously known for any
of the above problems was $\Omega((\log n)^2)$.



1.  INTRODUCTION

Much research is now done on parallel algorithms, although at this time most
computers contain only a single processor. However, most computers do use
parallel circuits to implement the most basic and often repeated operations,
such as the arithmetic operations: addition, subtraction, multiplication and
division. These operations are generally applied to integers with an n bit
binary representation, and to floating point reals with relative accuracy
$2^{-n}$. Other frequently repeated operations, which certainly would merit
special purpose circuits, are the elementary functions such as sine, cosine,
arctangent, exponentiation, logarithm, square roots, and fixed powers.

The depth of a circuit is the time for its parallel execution. What is the minimum
depth of boolean circuits for these arithmetic operations and elementary functions?

For integer addition, [Ofman, 62], [Krapchenko, 67] and [Ladner and Fischer,
80] give boolean circuits of depth $O(\log n)$ and size $O(n)$. Subtraction circuits
with the same asymptotic depth and size can easily be gotten from these addition
circuits.

For integer multiplication, [Ofman, 62] and [Wallace, 64] give boolean circuits
of depth $O(\log n)$, and [Schonhage and Strassen, 71] also achieve depth $O(\log n)$
with simultaneous size $O(n (\log n) \log\log n)$.

For division, the best known boolean circuit depth was $\Omega((\log n)^2)$.
[Anderson, et al., 67] first gave such a circuit (which incidentally was
implemented by them on the IBM/360 Model 91 Floating-Point Execution Unit).
[Knuth, 69] and [Aho, Hopcroft and Ullman, 74] describe a division circuit
attributed to Steve Cook of depth $O((\log n)^2)$ and size $O(n \log n \log\log n)$.

The best known boolean circuit depth for the elementary functions was
$\Omega((\log n)^2)$ [Brent, 76], [Kung, 76].

Many of the above mentioned boolean circuits of depth $\Omega((\log n)^2)$ use a
second order Newton iteration with $\Omega(\log n)$ steps, each requiring an n-bit
integer multiplication with $\Omega(\log n)$ depth. Alternatively, a reduction is
made to the problem of computing the m-th power of an n-bit integer modulo
$2^n + 1$ for $m = O(n)$. This is naively computed by $\Omega(\log n)$ steps of
repeated squaring, where each square computation requires $\Omega(\log n)$ depth.

This paper gives uniform boolean circuits of depth $O(\log n (\log\log n)^2)$
for the problem of computing the product of m n-bit integers modulo $2^n + 1$.
From this result, we get uniform boolean circuits of depth $O(\log n (\log\log n)^2)$
for the problems of division and computing elementary functions, among others.


[Borodin, 77] proved that if a function f is computed in uniform boolean
circuit depth $d(n) \ge \log n$, then f can be computed by a deterministic Turing
Machine with space $d(n)$. Thus division and the elementary functions can be
computed in deterministic space $O(\log n (\log\log n)^2)$. Note that as an amusing
consequence, we have that for any $n > 0$ the first n digits of $\pi$, Euler's
constant $e$, and the golden ratio $\varphi$ can all be computed by uniform boolean
circuits of depth $O(\log n (\log\log n)^2)$, and hence can be computed in
deterministic space $O(\log n (\log\log n)^2)$.

An essential technique in the construction of our product circuit is the use
of negatively wrapped convolutions, which can be computed in boolean depth
$O(\log n)$ by the fast Fourier transform of [Cooley and Tukey, 65]. This technique
was first introduced by [Schonhage and Strassen, 71] for the multiplication
of two integers. Our innovation was to generalize the technique to products of
more than two integers.

Our techniques are best understood first in the context of polynomials and
power series in, say, $\mathcal{Q}[x]$. In fact, this context is interesting in itself.
We might envision a special purpose computer designed for algebraic computation.
Its data are (coefficients of) polynomials and power series. The arithmetic
operations including division of polynomials and power series are elementary
operations of our "algebraic computer." Also, frequently applied operations are
the composition of power series, reversion of a power series, computation of
elementary functions applied to power series, and interpolation of polynomials.

Section 2 gives circuits of depth $O(\log n)$ for all these polynomial
and power series operations, where each gate of the circuits computes an addition,
multiplication, or a division of two rationals. In the case that the polynomials
and power series have rational coefficients, we have boolean circuits of
$O(\log n (\log\log n)^2)$ depth for all these polynomial and power series operations.
Furthermore, we can also evaluate the resulting polynomials and power series
within accuracy $O(2^{-n})$ by boolean circuits with depth $O(\log n (\log\log n)^2)$.
2.  CIRCUITS FOR POLYNOMIAL AND POWER SERIES COMPUTATIONS

2.0  Circuit Definitions

A circuit $\alpha_N$ over a commutative ring $\mathcal{R} = (\mathcal{D}, +, \cdot, 0, 1)$ is an acyclic
labeled digraph, with

(i)  a list of N distinguished input nodes that have no entering edges

(ii)  constant nodes with indegree 0 and labeled with constants in $\mathcal{D}$

(iii)  internal nodes with indegree two and labeled with the symbols in
{"+", "$\cdot$"}

(iv)  a list of $\ell$ distinguished output nodes.

Given an assignment of the input nodes from domain $\mathcal{D}$, the value of the
circuit at the output nodes is gotten by evaluation of the gates in topological
order. The circuit $\alpha_N$ thus defines a mapping from $\mathcal{D}^N$ to $\mathcal{D}^\ell$. A circuit
$\alpha_N$ over the rationals $\mathcal{Q}$ is similarly defined, except the nodes can also
compute division.

Let f be a function of (the coefficients of) m polynomials $p_1(x), \ldots, p_m(x)$
in $\mathcal{R}[x]$ of degree $n-1$. A circuit $\alpha_N$ for f has $N = mn$ inputs, namely
the list of N coefficients in $\mathcal{D}$ of the given polynomials. The output nodes
of $\alpha_N$ give the list of coefficients of $f(p_1(x), \ldots, p_m(x))$. If on the other
hand f is a function of m power series $p_1(x), \ldots, p_m(x)$ in $\mathcal{R}[x]$ each with
n given low order coefficients, then the circuit $\alpha_N$ for f also has $N = nm$
inputs, and the output nodes of $\alpha_N$ only give some prescribed finite number of
the coefficients of (the possibly infinite) power series $f(p_1(x), \ldots, p_m(x))$.

The depth of circuit $\alpha_N$ is the length of its longest path. A function f
over polynomials or power series in $\mathcal{R}[x]$ has simultaneous depth $O(d(N))$ and
size $O(S(N))$ if $\exists$ an infinite family of circuits $\alpha_1, \ldots, \alpha_N, \ldots$ and constants
$c_1, c_2 \ge 1$ such that $\forall N \ge 1$, $\alpha_N$ has depth not more than $c_1 d(N)$ and size not
more than $c_2 S(N)$ and given N input coefficients of the input polynomial or
power series, $\alpha_N$ computes f within the prescribed number of coefficients.

All the circuits considered in this paper are uniform in the sense of
[Borodin, 77]; they may be constructed in space $O(\log N)$ by a deterministic
Turing Machine.

2.1  The Discrete Fourier Transform

Fix a commutative ring $\mathcal{R} = (\mathcal{D}, +, \cdot, 0, 1)$. We assume $\omega$ is the principal
N-th root of unity in $\mathcal{R}$. Given a vector $a \in \mathcal{D}^N$, the Discrete Fourier
Transform is

$$\mathrm{DFT}_N(a) = A a$$

where $A_{i,j} = \omega^{ij}$ for $0 \le i, j < N$. We assume N has a multiplicative
inverse and let $A^{-1}_{i,j} = \frac{1}{N}\,\omega^{-ij}$. The inverse Discrete Fourier Transform is
$\mathrm{DFT}_N^{-1}(a) = A^{-1} a$ and obviously satisfies $\mathrm{DFT}_N^{-1}(\mathrm{DFT}_N(a)) = a$. [Cooley and
Tukey, 65] gave the Fast Fourier Transform for which

THEOREM 2.1.  $\mathrm{DFT}_N$ and $\mathrm{DFT}_N^{-1}$ over $\mathcal{R}$ have simultaneous depth $O(\log N)$
and size $O(N \log N)$.

(Note given a vector $a \in \mathcal{D}^n$, where $n < N$, $\mathrm{DFT}_N(a)$ will be defined to be
$\mathrm{DFT}_N(a^+)$ where $a^+$ is the vector of length N derived by concatenating a
with $N - n$ zeros.)


2.2  Products of Polynomials

Suppose we are given m vectors $a_i \in \mathcal{D}^n$ for $i = 1, \ldots, m$. Each vector
$a_i = (a_{i,0}, \ldots, a_{i,n-1})^T$ gives the coefficients of a degree $n-1$ polynomial
$A_i(x) = \sum_{j=0}^{n-1} a_{i,j} x^j$ in $\mathcal{R}[x]$. Let $N = nm$. We wish to compute the product
polynomial $B(x) = \sum_{k=0}^{N-1} b_k x^k$, where $B(x) = \prod_{i=1}^{m} A_i(x)$. (Note that we have
$b_k = 0$ for $N - m + 1 \le k \le N - 1$.)
In the special case $m = 2$ and $N = 2n$, the convolution vector
$b = (b_0, \ldots, b_{N-1})^T = a_1 \circledast a_2$ gives the coefficients of $B(x)$. By the Convolution
Theorem:

$$a_1 \circledast a_2 = \mathrm{DFT}_N^{-1}(\mathrm{DFT}_N(a_1) \cdot \mathrm{DFT}_N(a_2))$$

where $\cdot$ denotes pairwise product. Hence the well-known result that

THEOREM 2.2.  The product of two polynomials in $\mathcal{R}[x]$ of degree $n-1$
has simultaneous depth $O(\log n)$ and size $O(n \log n)$.

In the case of general $m \ge 2$, we wish to compute the coefficient vector

$$b = (b_0, \ldots, b_{N-1})^T = a_1 \circledast \cdots \circledast a_m .$$

By repeated application of the Convolution Theorem, we get

LEMMA 2.1.  $b = \mathrm{DFT}_N^{-1}(\mathrm{DFT}_N(a_1) \cdots \mathrm{DFT}_N(a_m))$.


Thus we first compute in parallel for $i = 1, \ldots, m$ the transforms
$f_i = \mathrm{DFT}_N(a_i)$, where $f_i = (f_{i,0}, \ldots, f_{i,N-1})^T$. Next we compute in parallel
for $j = 0, \ldots, N-1$ the elementary products $F_j = \prod_{i=1}^{m} f_{i,j}$. Finally, we
compute $\mathrm{DFT}_N^{-1}((F_0, \ldots, F_{N-1})^T)$.

Since the computation of $\mathrm{DFT}_N$, $\mathrm{DFT}_N^{-1}$ and the required products $F_j$ each
have depth $O(\log N)$, we have:

THEOREM 2.3.  The product of m polynomials in $\mathcal{R}[x]$ of degree $n-1$
has depth $O(\log(nm))$.

(Note that the naive method of repeated squaring by Theorem 2.2 has
depth $\Omega(\log(m) \log(n))$.)
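
The three stages behind Theorem 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring is taken to be the complex numbers with $\omega = e^{2\pi i/N}$, the transform is the naive matrix form of Section 2.1 rather than a fast Fourier transform circuit, and the function names `dft` and `product_of_polynomials` are hypothetical.

```python
import cmath

def dft(a, N, inverse=False):
    """Matrix-form DFT_N with A[i][j] = w^(i*j); w = exp(2*pi*i/N) is an assumption."""
    sign = -1 if inverse else 1
    w = cmath.exp(sign * 2j * cmath.pi / N)
    a = list(a) + [0] * (N - len(a))              # pad with N - n zeros
    out = [sum(a[j] * w ** (i * j) for j in range(N)) for i in range(N)]
    return [v / N for v in out] if inverse else out

def product_of_polynomials(polys):
    """Coefficients b_0..b_{N-1} of the product of m polynomials with n integer coefficients each."""
    m, n = len(polys), len(polys[0])
    N = n * m
    f = [dft(a, N) for a in polys]                # stage 1: m independent transforms
    F = [1] * N
    for j in range(N):                            # stage 2: N independent elementary products
        for i in range(m):
            F[j] *= f[i][j]
    b = dft(F, N, inverse=True)                   # stage 3: one inverse transform
    return [round(v.real) for v in b]             # rounding is valid for integer inputs only
```

In a circuit, the loops in stages 1 and 2 run in parallel and each product $F_j$ is a balanced tree of depth $\log m$; the sequential loops here only reproduce the values.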


2.3  Modular Products of Polynomials

Let $B(x) = \prod_{i=1}^{m} A_i(x)$ be the product polynomial considered in the previous
section. Here we consider the computation of the modular product
$D(x) = \sum_{i=0}^{n-1} d_i x^i$ where $D(x) = B(x) \bmod (x^n + 1)$.


LEMMA 2.2.  The coefficients of $D(x)$ are $d_i = \sum_{r=0}^{m-1} (-1)^r b_{nr+i}$ for
$i = 0, \ldots, n-1$.

For proof, see the Appendix.

We assume $\omega$ is the principal nth root of unity in $\mathcal{R}$, and n has a
multiplicative inverse. We also assume there exists a $\psi \in \mathcal{D}$ such that $\psi^2 = \omega$.
Then $\psi^n = -1$. Let $\hat{a}_i = (a_{i,0}, \psi a_{i,1}, \ldots, \psi^{n-1} a_{i,n-1})^T$. The negatively wrapped
convolution of $a_1, \ldots, a_m$ is

$$\hat{d} = (d_0, \psi d_1, \ldots, \psi^{n-1} d_{n-1})^T .$$

In the Appendix we prove:

LEMMA 2.3.  $\hat{d} = \mathrm{DFT}_n^{-1}(\mathrm{DFT}_n(\hat{a}_1) \cdots \mathrm{DFT}_n(\hat{a}_m))$.


The above Lemmas 2.2, 2.3 and Theorem 2.1 imply:

THEOREM 2.4.  The modular product $\left(\prod_{i=1}^{m} A_i(x)\right) \bmod (x^n + 1)$ of polynomials
$A_1(x), \ldots, A_m(x)$ in $\mathcal{R}[x]$ of degree $n-1$ has simultaneous depth $O(\log(nm))$ and
size $O(nm \log(nm))$. The modular power $A(x)^m \bmod (x^n + 1)$ of a single polynomial
$A(x)$ of degree $n-1$ has simultaneous depth $O(\log(nm))$ and size $O(n \log(nm))$.
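
As a small numerical check of Lemmas 2.2 and 2.3, the sketch below computes the modular product two ways: by folding the full product with alternating signs, and by a length-n transform of the $\psi$-weighted vectors. It assumes the complex numbers as the ring, with $\psi = e^{i\pi/n}$ and $\omega = \psi^2$; all function names are hypothetical.

```python
import cmath

def full_product(polys):
    """Schoolbook product, used only as a reference for the coefficients b_k."""
    b = [1]
    for a in polys:
        c = [0] * (len(b) + len(a) - 1)
        for i, x in enumerate(b):
            for j, y in enumerate(a):
                c[i + j] += x * y
        b = c
    return b

def negacyclic_by_folding(b, n):
    """Lemma 2.2: d_i is the alternating sum of b_{nr+i} over r."""
    d = [0] * n
    for k, bk in enumerate(b):
        r, i = divmod(k, n)
        d[i] += (-1) ** r * bk
    return d

def negacyclic_by_dft(polys):
    """Lemma 2.3 over the complex numbers (an assumption): psi^2 = omega, psi^n = -1."""
    n = len(polys[0])
    psi = cmath.exp(1j * cmath.pi / n)
    omega = psi * psi
    e = [1] * n
    for a in polys:
        a_hat = [psi ** j * a[j] for j in range(n)]
        for k in range(n):
            e[k] *= sum(a_hat[j] * omega ** (j * k) for j in range(n))
    d_hat = [sum(e[k] * omega ** (-j * k) for k in range(n)) / n for j in range(n)]
    # undo the psi weights; rounding is valid for integer inputs only
    return [round((d_hat[j] / psi ** j).real) for j in range(n)]
```

Only n-point transforms are needed, instead of the nm-point transforms of Section 2.2, which is where the size bound of the theorem above comes from.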


2.4  Elementary Functions on Power Series

An immediate consequence of Theorem 2.3 is

COROLLARY 2.1.  The composition of two power series in $\mathcal{R}[x]$ has depth
$O(\log n)$.

The elementary functions exp(x), log(x), sin(x), cos(x), arctan(x), and
square root(x), etc. all have known Taylor series expansions convergent over
given intervals. Thus by Corollary 2.1 we have:

COROLLARY 2.2.  The elementary functions on $\mathcal{Q}[x]$ have depth $O(\log n)$.


For some given $x_1, \ldots, x_N \in \mathcal{D}$ it is frequently useful in algebraic
computations to determine the polynomial

$$\prod_{i=1}^{N} (x - x_i) = \sum_{j=0}^{N} (-1)^j p_j x^{N-j}$$

whose coefficients

$$p_j = \sum_{1 \le i_1 < \cdots < i_j \le N} x_{i_1} \cdots x_{i_j}$$

are the elementary symmetric functions. It was pointed out to us by Les Valiant
that Theorem 2.3 immediately implies

COROLLARY 2.3.  The elementary symmetric functions in $\mathcal{R}[x]$ have depth
$O(\log N)$.
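
For illustration, the elementary symmetric functions can be read off from the product of the N linear factors. The sketch below multiplies the factors one after another, so it reproduces the values but not the $O(\log N)$ depth, which comes from computing the product by Theorem 2.3; the function name is hypothetical.

```python
def elementary_symmetric(xs):
    """Return [p_0, ..., p_N] for the inputs xs, with p_0 = 1."""
    coeffs = [1]                       # coefficients of prod (x - x_i), highest degree first
    for xi in xs:
        nxt = coeffs + [0]             # multiply by x
        for k in range(len(coeffs)):
            nxt[k + 1] -= xi * coeffs[k]   # subtract x_i times the old polynomial
        coeffs = nxt
    # the coefficient of x^(N-j) is (-1)^j p_j
    return [(-1) ** j * c for j, c in enumerate(coeffs)]
```

For example, with inputs 1, 2, 3 the product is $x^3 - 6x^2 + 11x - 6$, giving $p_1 = 6$, $p_2 = 11$, $p_3 = 6$.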


2.5  Power Series and Polynomial Division

Let $A(z) = \sum_{i=0}^{n-1} a_i z^i$ be a power series in $\mathcal{Q}[x]$. The reciprocal of $A(z)$
is the power series $I(z) = \sum_{i=0}^{\infty} r_i z^i$ such that $A(z) \cdot I(z) = 1$. $I(z)$ has the
infinite series expansion

$$I(z) = \sum_{i=0}^{\infty} (1 - A(z))^i .$$

We wish to compute the first n coefficients of $I(z)$. Since
$I(z) = \sum_{i=0}^{n-1} (1 - A(z))^i + O(z^n)$, we have by Theorem 2.3:

COROLLARY 2.4.  The first n terms of the reciprocal of a power series and
the division of two power series in $\mathcal{Q}[x]$ can be computed in depth $O(\log n)$.

An alternative method using the lemma below results in a circuit of depth
$O(\log n)$ with smaller circuit size.

LEMMA 2.4.  If $\tilde{I}(z) = \prod_{i=0}^{\log(n+1)-1} \left(1 + (1 - A(z))^{2^i}\right)$ then $|I(z) - \tilde{I}(z)| = O(z^n)$
for $z \in (0, \tfrac{1}{2})$ and $A(z) \ge 1 - z$.

For proof, see the Appendix.
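
The product form of Lemma 2.4 can be tried out on truncated power series. The following sketch works with exact rationals and assumes $a_0 = 1$, so that $1 - A(z)$ has zero constant term and the truncation at $z^n$ is exact; the helper names are hypothetical.

```python
from fractions import Fraction

def mul_trunc(a, b, n):
    """Product of two power series, keeping only the first n coefficients."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def reciprocal(a, n):
    """First n coefficients of 1/A(z), assuming a[0] == 1."""
    a = [Fraction(x) for x in a[:n]] + [Fraction(0)] * (n - len(a))
    B = [-x for x in a]
    B[0] += 1                                   # B = 1 - A
    result = [Fraction(1)] + [Fraction(0)] * (n - 1)
    power, terms = B, 1                         # power holds B^(2^i)
    while terms < n:                            # about log n factors
        factor = power[:]
        factor[0] += 1                          # 1 + B^(2^i)
        result = mul_trunc(result, factor, n)
        power = mul_trunc(power, power, n)
        terms *= 2
    return result
```

After k factors the running product equals $\sum_{j < 2^k} (1 - A(z))^j$, so roughly $\log n$ multiplications replace the n powers used in the first method.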

In the Appendix we show that Corollary 2.4 implies:

COROLLARY 2.5.  Given polynomials $a(x)$, $b(x)$ in $\mathcal{Q}[x]$ of degree at most
n, we can compute in depth $O(\log n)$ the unique polynomials $q(x)$, $r(x)$ such
that $a(x) = q(x) b(x) + r(x)$ and $\mathrm{degree}(r(x)) < \mathrm{degree}(b(x))$.


2.6  Polynomial Interpolation

COROLLARY 2.6.  Interpolation of a polynomial in $\mathcal{Q}[x]$ has depth $O(\log n)$.

2.7  Reversion of a Power Series

In the Appendix we show that Theorem 2.3 and Corollary 2.4 imply:

COROLLARY 2.7.  The reversion of a power series in $\mathcal{Q}[x]$ has depth $O(\log n)$.

3.  INTEGER COMPUTATIONS

3.0  Boolean Circuits

We consider computations over integers given as n bit binary numbers, and
reals over [0,1] given within accuracy $2^{-n}$. Our computational model in this
section is the boolean circuit, defined as usual. The i-th input node of $\alpha_N$
takes the i-th bit of the encoding of the input integer or real. Each gate of
$\alpha_N$ computes a boolean operation $\vee$, $\wedge$, or $\neg$. Each output node provides a bit
of the encoding of the computed integer or real. (In the case of reals with
floating point representation, we only provide the input and output bits up to
some finite prescribed accuracy.)

3.1  The DFT over an Integer Ring

We assume n and $\omega$ are positive powers of two. Let $p = \omega^{n/2} + 1$
and let $Z_p$ be the ring of integers modulo p.

PROPOSITION 3.1.  In $Z_p$, $\omega$ is the principal nth root of unity and n
has a multiplicative inverse modulo p.

Proposition 3.1 implies $\mathrm{DFT}_n$ and $\mathrm{DFT}_n^{-1}$ are defined.
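
Proposition 3.1 is easy to check numerically for small parameters. The sketch below assumes the usual meaning of "principal nth root of unity" ($\omega^n = 1$ and $\sum_{j=0}^{n-1} \omega^{jk} = 0$ for $1 \le k < n$), which the text does not spell out; the function name is hypothetical and the modular inverse via `pow(n, -1, p)` needs Python 3.8 or later.

```python
def check_proposition_3_1(n, omega):
    """Verify the claims of Proposition 3.1 for powers of two n and omega; returns p."""
    p = omega ** (n // 2) + 1
    assert pow(omega, n, p) == 1                       # omega^n = 1 in Z_p
    for k in range(1, n):                              # vanishing sums make omega principal
        assert sum(pow(omega, j * k, p) for j in range(n)) % p == 0
    n_inv = pow(n, -1, p)                              # exists because p is odd
    assert (n * n_inv) % p == 1
    return p
```

Note that p need not be prime: for n = 4 and $\omega = 8$ the modulus is 65, and the checks still pass.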

The fast Fourier transform computation of [Cooley and Tukey, 65] yields an
arithmetic circuit $\alpha_n$ of depth $O(\log n)$ and size $O(n \log n)$ computing $\mathrm{DFT}_n$
whose elements require:

(i)  addition of two $\lceil \log p \rceil$-bit integers.

(ii)  multiplication of a $\lceil \log p \rceil$-bit integer by a power of $\omega$.

We wish to expand $\alpha_n$ into a boolean circuit. Since $\omega$ is a power of
two, the multiplications can be implemented by the appropriate bit shifts
(i.e., the gate connections are shifted by the appropriate amount). The
additions can be implemented by Carry-Save Add circuitry of [Ofman, 62] and
[Wallace, 64] (also see [Savage, 76]) yielding a boolean circuit of depth
$O(\log(np))$ and size $O(np \log(np))$. Thus we have

THEOREM 3.1.  $\mathrm{DFT}_n$ and $\mathrm{DFT}_n^{-1}$ over integer ring $Z_p$ have simultaneous
boolean depth $O(\log(np))$ and size $O(np \log(np))$.

3.2  Products of Integers

[Schonhage and Strassen, 71] have shown:

THEOREM 3.2.  The product of two N-bit integers has simultaneous boolean
depth $O(\log N)$ and size $O(N \log N \log\log N)$.

We now show:

THEOREM 3.3.  Given a list of N-bit integers $a_1, \ldots, a_m$, the product
$\left(\prod_{i=1}^{m} a_i\right) \bmod (2^N + 1)$ has boolean depth $O(\log(Nm) (\log\log N)^2)$.

(Note that the naive method of repeated squaring by Theorem 3.2 results
in a boolean circuit of depth $\Omega(\log(m) \log N)$.)

Proof.  In the case $m > N/(8 \log N)$ we do the computation by partitioning
$a_1, \ldots, a_m$ into $\lceil m/N^{1/2} \rceil$ groups, each of size at most $N^{1/2}$. We compute the
product of all the elements of each group in parallel by $O(\log\log N)$ iterations
of a method described in the proof of Lemma 3.1 below. The result is
a list of $\lceil m/N^{1/2} \rceil$ integers of N bits each.

Our resulting boolean circuit for product will have depth $D(m,N)$. It
will satisfy the recurrence

$$D(m,N) = D(\lceil m/N^{1/2} \rceil, N) + D(\lceil N^{1/2} \rceil, N) \quad \text{for } m > N/(8 \log N).$$

In the case $m = 1$, we obviously have

$$D(m,N) = 1 .$$

We will prove below:

LEMMA 3.1.  We can construct our boolean circuit for product to satisfy:

$$D(m,N) = D(m, 8\lceil (Nm \log m)^{1/2} \rceil) + O(\log N)$$

for $1 < m \le N/(8 \log N)$.

Note that $O(\log\log N)$ applications of the recurrence of Lemma 3.1
imply

$$D(\lceil N^{1/2} \rceil, N) = D(\lceil N^{1/2} \rceil, 16\lceil N^{1/2} \log N \rceil) + O(\log N \log\log N) .$$

Solving the above recurrences we get

$$D(m,N) = O(\log(Nm) (\log\log N)^2)$$

for all $m \ge 1$. Thus we have proved Theorem 3.3.


Proof of Lemma 3.1.  We can assume we are given N-bit integers
$a_1, \ldots, a_m$, where $m \le N/(8 \log N)$. We wish to compute $d = b \bmod (2^N + 1)$,
where $b = \prod_{i=1}^{m} a_i$.

Fix n to be the largest power of two not more than $8(Nm \log m)^{1/2}$, and
let $\ell = \lceil N/n \rceil$. Each number $a_i$ is subdivided into n "chunks" $a_{i,0}, \ldots, a_{i,n-1}$
where $0 \le a_{i,j} < 2^\ell$. Then define the polynomial $A_i(x) = \sum_{j=0}^{n-1} a_{i,j} x^j$ such that
$a_i = A_i(2^\ell)$. The corresponding product polynomial is

$$B(x) = \sum_{i=0}^{nm-1} b_i x^i , \quad \text{where } B(x) = \prod_{i=1}^{m} A_i(x) ;$$

it must satisfy $b = B(2^\ell)$. The modular product polynomial is $D(x) = \sum_{i=0}^{n-1} d_i x^i$,
where $D(x) = B(x) \bmod (x^n + 1)$; it satisfies $d = D(2^\ell)$, which is what we have
to compute.

In the Appendix we prove:

PROPOSITION 3.2.  For each $j = 0, \ldots, n-1$, $|d_j| < 2^{2m(\ell + 1 + \log n) \log m}$.

Let $\omega = 4$ and $p = \omega^{n/2} + 1$. Then by Proposition 3.1, the integer ring
$Z_p$ has $\omega$ as the principal n-th root of unity and n has a multiplicative
inverse mod p. Also, we define $\psi = 2$. Let $\hat{a}_i = (a_{i,0}, \psi a_{i,1}, \ldots, \psi^{n-1} a_{i,n-1})^T$
for $i = 1, \ldots, m$. By Lemma 2.2, the coefficients of $D(x)$ are
$d_i = \sum_{r=0}^{m-1} (-1)^r b_{nr+i}$ for $i = 0, \ldots, n-1$. By Proposition 3.2, and by our choice
of n we have $|d_i| < p/2$ for all $i = 0, \ldots, n-1$. Then $\hat{d} = (d_0, \psi d_1, \ldots, \psi^{n-1} d_{n-1})^T$
is the negatively wrapped convolution of the coefficients of polynomials
$A_1(x), \ldots, A_m(x)$. To compute $\hat{d}$, in parallel for $i = 1, \ldots, m$ we compute in
the ring $Z_p$, $\mathrm{DFT}_n(\hat{a}_i) = (g_{i,0}, \ldots, g_{i,n-1})^T$, then in parallel for $k = 0, \ldots, n-1$
we compute $e_k = \prod_{i=1}^{m} g_{i,k} \bmod p$, and finally by Lemma 2.3,
$\hat{d} = \mathrm{DFT}_n^{-1}((e_0, \ldots, e_{n-1})^T)$. Since $\psi$ is a power of two, we can easily extract
$d_0, \ldots, d_{n-1}$ from $\hat{d}$ in depth $O(\log n)$. By Theorem 3.1, the $\mathrm{DFT}_n$ and $\mathrm{DFT}_n^{-1}$
computations have depth $O(\log n)$.

Note that since $p = \omega^{n/2} + 1 = 2^n + 1$ and $n \le 8(Nm \log m)^{1/2}$, the recurrence
claimed in Lemma 3.1 is satisfied.  □
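
One level of the construction in this proof can be sketched directly with Python integers. The sketch assumes that n is a power of two dividing N (so $\ell = N/n$ exactly), that every input is below $2^N$, and it replaces Proposition 3.2 by a cruder sufficient bound checked at run time. The products $e_k$ are formed by ordinary modular multiplication instead of the recursive call, and the transforms are the naive matrix form; the function name is hypothetical.

```python
def product_mod_fermat(nums, N, n):
    """Product of nums modulo 2^N + 1, using n chunks of N/n bits per number."""
    m, l = len(nums), N // n
    omega, psi, p = 4, 2, 2 ** n + 1               # p = omega^(n/2) + 1, psi^2 = omega
    # crude sufficient condition for |d_i| < p/2 (an assumption replacing Proposition 3.2)
    assert n ** (m - 1) * 2 ** (l * m) < p // 2
    e = [1] * n
    for a in nums:
        chunks = [(a >> (l * j)) & (2 ** l - 1) for j in range(n)]
        a_hat = [pow(psi, j, p) * chunks[j] % p for j in range(n)]
        for k in range(n):                         # DFT_n in Z_p, then elementary products
            g = sum(a_hat[j] * pow(omega, j * k, p) for j in range(n)) % p
            e[k] = e[k] * g % p
    n_inv, w_inv, psi_inv = pow(n, -1, p), pow(omega, -1, p), pow(psi, -1, p)
    d = 0
    for i in range(n):                             # inverse DFT, then remove the psi weights
        d_hat = n_inv * sum(e[k] * pow(w_inv, i * k, p) for k in range(n)) % p
        d_i = d_hat * pow(psi_inv, i, p) % p
        if d_i > p // 2:                           # recover the signed value of d_i
            d_i -= p
        d += d_i << (l * i)                        # evaluate D(x) at x = 2^l
    return d % (2 ** N + 1)
```

The step from $D(2^\ell)$ to the answer works because $x^n + 1$ becomes $2^N + 1$ at $x = 2^\ell$, so reducing the polynomial modulo $x^n + 1$ preserves its value modulo $2^N + 1$.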


3.3  Multiprecision Evaluation of Polynomials and Power Series

Let $p(x)$ be a polynomial or power series in $\mathcal{Q}[x]$ with $n-1$ given
rational coefficients of magnitude $\le 2^n$. We wish to evaluate $p(x)$ at a
floating point real $x_0$ within relative accuracy $O(2^{-n})$. By Theorem 3.3 we
have

COROLLARY 3.1.  The evaluation of $p(x)$ at a given $x_0$ to relative
accuracy $O(2^{-n})$ has boolean depth $O(\log n (\log\log n)^2)$.

Since the elementary functions exp(x), log(x), sin(x), cos(x), arctan(x),
square root(x), etc. have power series expansions over given intervals, we have

COROLLARY 3.2.  The evaluation of an elementary function to relative
accuracy $O(2^{-n})$ has boolean depth $O(\log n (\log\log n)^2)$.

COROLLARY 3.3.  The elementary symmetric functions (see Section 2.4)
over $\mathcal{Q}[x]$ have boolean depth $O(\log n (\log\log n)^2)$.

3.4  Reciprocals and Division of Integers

Let a be an integer within bounds $2^{n-1} \le a < 2^n$. Then a has binary
representation $\sum_{i=0}^{n-1} a_i 2^i$ where $a_{n-1} = 1$. The reciprocal of a is $2^{-n} r$,
where $r = \sum_{i=0}^{\infty} r_i 2^{-i}$. We wish to compute the first n bits $r_0, \ldots, r_{n-1}$.
For this, we can use the product form of [Anderson, et al., 67] and [Savage, 76].

LEMMA 3.3.  If $\tilde{r} = \prod_{i=0}^{\log(n+1)-1} \left(1 + (1 - 2^{-n} a)^{2^i}\right)$ then $|r - \tilde{r}| = O(2^{-n})$.

By Theorem 3.3 and the above lemma, we get

COROLLARY 3.4.  The reciprocal can be computed within relative accuracy
$O(2^{-n})$ by a boolean circuit of depth $O(\log n (\log\log n)^2)$.
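
The error bound of Lemma 3.3 can be checked with exact rational arithmetic. The sketch below is not a boolean circuit; it only evaluates the product form for an n-bit integer a, assuming $n + 1$ is a power of two as the upper limit $\log(n+1) - 1$ suggests. The function name is hypothetical.

```python
from fractions import Fraction

def reciprocal_product_form(a, n):
    """Approximation of r = 2^n / a for 2^(n-1) <= a < 2^n via the product form."""
    B = 1 - Fraction(a, 2 ** n)            # 0 < B <= 1/2
    r_tilde, power, terms = Fraction(1), B, 1
    while terms < n + 1:                   # log(n+1) factors when n + 1 is a power of two
        r_tilde *= 1 + power               # factor 1 + B^(2^i)
        power *= power
        terms *= 2
    return r_tilde
```

After the loop $\tilde{r} = \sum_{j=0}^{n} B^j$, so the error is $B^{n+1}/(1-B) \le 2^{-n}$, with equality at $a = 2^{n-1}$.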

COROLLARY 3.5.  Given integers a, b with binary representation containing
n bits, we can compute in boolean depth $O(\log n (\log\log n)^2)$ the division
quotient q and remainder r, integers such that $a = qb + r$ and $0 \le r < b$.

Further Results

Our results for $\mathcal{Q}[x]$ can be extended to Euclidean domains. In a forthcoming
draft of this paper, we improve the size bounds of our circuitry.
Also, we can reduce our boolean depth bound for products in Theorem 3.3 to
$O(\log N \log\log N)$ by improving Lemma 3.1 to get the recurrence
$D(m,N) = D(m, m\lceil \log N \rceil) + O(\log N)$ for $m \le N/(8 \log N)$.
APPENDIX

Proof of Lemma 2.2.

$$B(x) = \sum_{j=0}^{N-1} b_j x^j = \sum_{r=0}^{m-1} \sum_{i=0}^{n-1} b_{nr+i}\, x^{nr+i} \equiv \sum_{i=0}^{n-1} \left( \sum_{r=0}^{m-1} (-1)^r b_{nr+i} \right) x^i \pmod{x^n + 1}$$

since $(-1)^r \equiv x^{nr} \pmod{x^n + 1}$.  □


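Proof of Lemma 2.3.  For $i = 1, \ldots, m$ let $\mathrm{DFT}_n(\hat{a}_i) = (g_{i,0}, \ldots, g_{i,n-1})^T$
where

$$g_{i,k} = \sum_{j=0}^{n-1} \psi^j \omega^{jk} a_{i,j}$$

for $k = 0, \ldots, n-1$. Let

$$e_k = \prod_{i=1}^{m} g_{i,k} = \sum_{0 \le j_1, \ldots, j_m < n} \left( \prod_{i=1}^{m} a_{i,j_i} \right) \psi^{\sum_i j_i}\, \omega^{k \sum_i j_i} .$$

Now let $\mathrm{DFT}_n(\hat{d}) = (e'_0, \ldots, e'_{n-1})^T$. Then for $k = 0, \ldots, n-1$ we have

$$e'_k = \sum_{\ell=0}^{n-1} \psi^\ell \omega^{k\ell} d_\ell
      = \sum_{\ell=0}^{n-1} \sum_{r=0}^{m-1} \psi^\ell \omega^{k\ell} (-1)^r b_{nr+\ell} \quad \text{(by Lemma 2.2)}$$

$$    = \sum_{\ell=0}^{n-1} \sum_{r=0}^{m-1} \psi^\ell \omega^{k\ell} (-1)^r \sum_{\substack{0 \le j_1, \ldots, j_m < n \\ nr + \ell = \sum_i j_i}} \prod_{i=1}^{m} a_{i,j_i} .$$

But if we substitute $\ell = \sum_{i=1}^{m} j_i - nr$ into the above expansion, we get

$$\psi^\ell \omega^{k\ell} (-1)^r = \psi^{\sum_i j_i}\, \omega^{k \sum_i j_i}$$

since $\psi^{nr} = (-1)^r$ and $\omega^n = 1$. Hence $e'_k = e_k$.  □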

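Proof of Lemma 2.4.  Let $B(z) = 1 - A(z)$. Then
$A(z)\tilde{I}(z) = (1 - B(z))\tilde{I}(z) = 1 - B(z)^{n+1} = 1 - (1 - A(z))^{n+1}$. So

$$|\tilde{I}(z) - I(z)| = \left| \frac{(1 - A(z))^{n+1}}{A(z)} \right|$$
$$\le 2 (1 - A(z))^{n+1} \quad \text{since } A(z) \ge \tfrac{1}{2}$$
$$\le 2 z^{n+1} \quad \text{since } z \ge 1 - A(z)$$
$$\le O(z^n) \quad \text{since } z \in (0, \tfrac{1}{2}) .  \quad \Box$$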

Proof of Corollary 2.5.  (Also, see [Knuth, 81].) Let $n_1 = \mathrm{degree}(a(x))$
and $n_2 = \mathrm{degree}(b(x))$. The computation is trivial unless $n_1 \ge n_2 \ge 1$. Then

$$A(z) = Q(z) B(z) + z^{n_1 - n_2 + 1} R(z)$$

where

$$A(z) = z^{n_1} a(\tfrac{1}{z}), \quad B(z) = z^{n_2} b(\tfrac{1}{z}), \quad Q(z) = z^{n_1 - n_2} q(\tfrac{1}{z}), \quad \text{and } R(z) = z^{n_2 - 1} r(\tfrac{1}{z}) .$$

Thus to compute the coefficients of $q(x)$, $r(x)$ we compute the first
$n_1 - n_2 + 1$ coefficients of $A(z)/B(z) = Q(z) + O(z^{n_1 - n_2 + 1})$, then compute the
power series $A(z) - B(z) Q(z) = z^{n_1 - n_2 + 1} R(z)$, and finally output the
coefficients of $Q(z)$, $R(z)$.  □
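
The reduction of polynomial division to power series division can be followed step by step in the sketch below. It uses exact rationals, and the series inverse is computed by a simple sequential recurrence rather than by the parallel methods of Section 2.5; this substitution, and the function names, are assumptions made for brevity. The remainder is obtained as $a(x) - q(x) b(x)$, which carries the same information as $A(z) - B(z)Q(z)$.

```python
from fractions import Fraction

def series_inverse(b, k):
    """First k coefficients of 1/B(z) by a sequential recurrence; needs b[0] != 0."""
    inv = [Fraction(1) / b[0]]
    for t in range(1, k):
        s = sum(b[j] * inv[t - j] for j in range(1, min(t, len(b) - 1) + 1))
        inv.append(-s / b[0])
    return inv

def divide(a, b):
    """Quotient and remainder of a by b; coefficient lists are lowest degree first,
    with nonzero leading coefficients and degree(a) >= degree(b) >= 1."""
    a = [Fraction(x) for x in a]
    b = [Fraction(x) for x in b]
    n1, n2 = len(a) - 1, len(b) - 1
    k = n1 - n2 + 1
    A, B = a[::-1], b[::-1]                    # z^n1 a(1/z) and z^n2 b(1/z)
    Binv = series_inverse(B, k)
    Q = [sum(A[j] * Binv[t - j] for j in range(t + 1)) for t in range(k)]
    q = Q[::-1]                                # Q(z) = z^(n1-n2) q(1/z)
    r = a[:]
    for i, qi in enumerate(q):                 # r = a - q * b
        for j, bj in enumerate(b):
            r[i + j] -= qi * bj
    return q, r[:n2]
```

For instance, dividing $x^3 - 2x^2 - 4$ by $x - 3$ gives quotient $x^2 + x + 3$ and remainder 5.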


Proof of Corollary 2.6.  Suppose we are given polynomials $p_1(x), \ldots, p_m(x)$
in $\mathcal{Q}[x]$ each of degree $n-1$, and polynomials $q_1(x), \ldots, q_m(x)$ where
$\mathrm{degree}(q_i(x)) < \mathrm{degree}(p_i(x))$ for $i = 1, \ldots, m$. Let $P(x) = \prod_{i=1}^{m} p_i(x)$. The
Chinese Remainder Theorem states that there is a unique polynomial $Q(x)$ of
degree less than that of $P(x)$ such that $Q(x) \equiv q_i(x) \bmod p_i(x)$ for
$i = 1, \ldots, m$.

The Lagrangian interpolation formula gives

$$Q(x) = \sum_{i=1}^{m} q_i(x)\, r_i(x)\, s_i(x) \bmod P(x)$$

where $s_i(x) = P(x)/p_i(x)$ and $r_i(x)$ is the multiplicative inverse of
$s_i(x) \bmod p_i(x)$.

Theorem 2.2 and Corollary 2.5 imply that preconditioned Chinese remaindering,
with the $r_1(x), \ldots, r_m(x)$ also given, has depth $O(\log n)$.

However, in the special case $p_i(x) = x - a_i$ for $i = 1, \ldots, m$, where the $a_i$
are distinct, each $r_i(x) = 1/s_i(a_i)$ can be computed in parallel by
Theorem 2.3 and Corollary 2.5 in depth $O(\log n)$. In this case the $q_i(x) = b_i$
are constants, since they must have degree less than the $p_i(x)$.

Further note that in this case $Q(x)$ is the unique polynomial such that
$Q(a_i) = b_i$ for $i = 1, \ldots, m$. Thus we have proved Corollary 2.6.  □
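
The special case $p_i(x) = x - a_i$ is ordinary Lagrange interpolation, and the formula above can be evaluated literally. The sketch below does so sequentially with exact rationals; it illustrates the quantities $P$, $s_i$ and $r_i$ but not the parallel depth. The function name is hypothetical.

```python
from fractions import Fraction

def interpolate(points):
    """Coefficients (lowest degree first) of the unique Q of degree < m with
    Q(a_i) = b_i, for points = [(a_1, b_1), ..., (a_m, b_m)] with distinct a_i."""
    m = len(points)
    P = [Fraction(1)]                          # P(x) = prod (x - a_i)
    for a, _ in points:
        P = [Fraction(0)] + P                  # multiply by x
        for k in range(len(P) - 1):
            P[k] -= a * P[k + 1]               # subtract a times the old polynomial
    Q = [Fraction(0)] * m
    for a, b in points:
        s = [Fraction(0)] * m                  # s_i(x) = P(x) / (x - a_i), synthetic division
        carry = Fraction(0)
        for k in range(m, 0, -1):
            carry = P[k] + a * carry
            s[k - 1] = carry
        r = 1 / sum(c * a ** k for k, c in enumerate(s))   # r_i = 1 / s_i(a_i)
        for k in range(m):
            Q[k] += b * r * s[k]
    return Q
```

With the points (0, 1), (1, 3), (2, 7) this returns the coefficients of $1 + x + x^2$.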

Proof of Corollary 2.7.  Let $A(x) = \sum_{i=0}^{\infty} a_i x^i$ be a power series in $\mathcal{Q}[x]$
where $a_0 = 0$ and $a_1 = 1$. The reversion of $A(x)$ is the power series
$R(z) = \sum_{k=0}^{\infty} r_k z^k$ where $z = A(x)$, that is, $x = R(z)$. Note that $r_0 = 0$ and
$r_1 = 1$. For the kth coefficient, we first compute

$$B(x) = \left( \frac{x}{A(x)} \right)^k = \sum_{i=0}^{\infty} b_i x^i$$

and then apply Lagrange's reversion formula [Lagrange, 1768] $r_k = b_{k-1}/k$
for $k \ge 2$. Thus Theorem 2.3 and Corollary 2.4 imply Corollary 2.7.  □
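
Lagrange's reversion formula translates into a short computation on truncated series. The sketch below assumes $a_0 = 0$ and $a_1 = 1$ as in the proof, uses exact rationals, and forms the powers of $x/A(x)$ one after another instead of in parallel; the helper names are hypothetical.

```python
from fractions import Fraction

def mul_trunc(a, b, n):
    """Product of two power series, keeping only the first n coefficients."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def series_inverse(c, n):
    """First n coefficients of 1/C(x) by a sequential recurrence; needs c[0] != 0."""
    inv = [Fraction(1) / c[0]]
    for t in range(1, n):
        s = sum(c[j] * inv[t - j] for j in range(1, min(t, len(c) - 1) + 1))
        inv.append(-s / c[0])
    return inv

def reversion(a, n):
    """Coefficients r_0..r_{n-1} of R(z) with R(A(x)) = x, assuming a[0] == 0, a[1] == 1, n >= 2."""
    a = [Fraction(x) for x in a[:n + 1]] + [Fraction(0)] * (n + 1 - len(a))
    x_over_A = series_inverse(a[1:], n)        # A(x)/x has constant term 1
    r = [Fraction(0), Fraction(1)]
    power = x_over_A                           # (x / A(x))^1
    for k in range(2, n):
        power = mul_trunc(power, x_over_A, n)  # (x / A(x))^k
        r.append(power[k - 1] / k)             # r_k = b_{k-1} / k
    return r
```

For $A(x) = x + x^2$ the reversion is $z - z^2 + 2z^3 - 5z^4 + 14z^5 - \cdots$, the Catalan numbers with alternating signs.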




